This is a self-contained data analysis report for REC-22-436_REST_ECO_MANUSCRIPT_REV1.

The main hypothesis of this study were evaluated in consideration of daytime pumping station and artificial habitat use, which were

  1. Roach will occupy pumping station when artificial habitat is absent

  2. When introduced (pre-exclusion), roach will show equal preference towards artificial habitat and pumping station

  3. When excluded from the pumping station, artificial habitat occupancy will increase

  4. Roach will show a preferential change (compared to pre-exclusion) to occupying artificial habitat when the pumping station is reintroduced (post-exclusion)

  5. Artificial habitat occupancy will be higher in sheltered (vs unsheltered) treatments

Required libraries
library(tidyverse)
library(knitr)
library(kableExtra)
library(funModeling)
library(dlookr)
library(glmmTMB)
library(ggeffects)
library(DHARMa)
library(ggpubr)

1. Load & prepare data

roach_wide.csv - wide data set where fish counts are stored in columns c_hab, c_open, c_ps

roach_wide=read_csv("./data/roach_wide.csv")

1.1 Data discovery

Determine dataset structure by checking number of rows, columns and data types (i.e., numerical, factors). Identify categorical factors and their levels.

nrow(roach_wide)
## [1] 6942
ncol(roach_wide)
## [1] 24
colnames(roach_wide)
##  [1] "date"          "time"          "run"           "trial"        
##  [5] "treatment"     "ps_tank_end"   "room_end"      "tank"         
##  [9] "light"         "sequence"      "day"           "seq_day"      
## [13] "hab_avail"     "ps_avail"      "both_avail"    "hours_havail" 
## [17] "hours_lout"    "c_hab"         "c_open_screen" "c_open_cent"  
## [21] "c_open_hab"    "c_open"        "c_ps"          "c_both"

funModeling:df_status - For each variable it returns: Quantity and percentage of zeros (q_zeros and p_zeros respectively). Same metrics for NA values (q_NA/p_na), and infinite values (q_inf/p_inf). Last two columns indicates data type and quantity of unique values.

roach_wide_status<- funModeling::df_status(roach_wide, print_results = F)
variable q_zeros p_zeros q_na p_na q_inf p_inf type unique
date 0 0.00 0 0.00 0 0 character 52
time 288 4.15 0 0.00 0 0 hms-difftime 24
run 0 0.00 0 0.00 0 0 numeric 4
trial 0 0.00 0 0.00 0 0 numeric 24
treatment 3471 50.00 0 0.00 0 0 numeric 2
ps_tank_end 3468 49.96 0 0.00 0 0 numeric 2
room_end 3471 50.00 0 0.00 0 0 numeric 2
tank 0 0.00 0 0.00 0 0 numeric 6
light 2616 37.68 0 0.00 0 0 numeric 2
sequence 0 0.00 0 0.00 0 0 numeric 4
day 0 0.00 0 0.00 0 0 numeric 13
seq_day 0 0.00 0 0.00 0 0 numeric 52
hab_avail 5214 75.11 0 0.00 0 0 numeric 2
ps_avail 5166 74.42 0 0.00 0 0 numeric 2
both_avail 3504 50.48 0 0.00 0 0 numeric 2
hours_havail 0 0.00 3516 50.65 0 0 numeric 72
hours_lout 288 4.15 2334 33.62 0 0 numeric 16
c_hab 3201 46.11 0 0.00 0 0 numeric 13
c_open_screen 5683 81.86 0 0.00 0 0 numeric 13
c_open_cent 5331 76.79 0 0.00 0 0 numeric 13
c_open_hab 4510 64.97 0 0.00 0 0 numeric 13
c_open 3732 53.76 0 0.00 0 0 numeric 13
c_ps 3724 53.64 0 0.00 0 0 numeric 13
c_both 599 8.63 0 0.00 0 0 numeric 13

Several variables need to be converted to factors and labels added to levels.

1.1.1 Set factors and add labels

Create a lookup table for variable labels.

labels_table <- list(
  treatment = c('Covered (B)','Uncovered (A)'),
  ps_tank_end = c('RH', 'LH'),
  room_end = c('Far', 'Near'),
  light = c('Day', 'Night'),
  hab_avail = c('AH unavailable', 'AH available'),
  ps_avail = c('PS available','PS unavailable'),
  both_avail = c('Single Available', 'Both Available'),
  sequence = c('Baseline', 'I 1', 'I 2', 'I 3')
)

Convert variables to factors with specific labeling using a loop. Use the lookup table to get the labels for each variable.

for (var in names(labels_table)) {
  levels <- unique(roach_wide[[var]])
  labels <- labels_table[[var]]
  roach_wide[[var]] <- factor(roach_wide[[var]], levels = levels, labels = labels)
}

Convert other variables to factors without adding labels.

other_vars <- c("hours_havail", "hours_lout", "run", "day", "trial")
roach_wide[other_vars] <- lapply(roach_wide[other_vars], as.factor)

1.2 Data diagnosis

1.2.1 NAs

Using the roach_wide_status dataframe consider variables with NA values.

roach_na <- roach_wide_status %>%
  filter(q_na > 0) %>%
  arrange(-p_na) %>%
  select(variable, q_na, p_na)
variable q_na p_na
hours_havail 3516 50.65
hours_lout 2334 33.62

Variables with NA not considered in main analysis, NAs can be omitted for visual analysis.

1.2.2 0’s

Using the roach_wide_status dataframe consider the spread of zeros in habitat count variables

roach_0 <- roach_wide_status %>%
  filter(variable %in% c("c_hab", "c_ps", "c_open", "c_open_cent", "c_open_screen", "c_open_hab")) %>%
  arrange(-p_zeros) %>%
  select(variable, q_zeros, p_zeros)
variable q_zeros p_zeros
c_open_screen 5683 81.86
c_open_cent 5331 76.79
c_open_hab 4510 64.97
c_open 3732 53.76
c_ps 3724 53.64
c_hab 3201 46.11

High proportion of 0’s in all counts, but will be confounded without consideration for light period and habitat availability. Regardless, the presence of high 0’s will significantly skew variability and make raw counts hard to interpret. Descriptive statistics would be limited by high IQR and misleading min/max values. Consider rescaling count variables for analysis.

1.2.3 Outliers

Check the main dataframe for outliers.

dlookr::plot_outlier - for each variable specified the function plots outlier information for numerical data diagnosis

dlookr::plot_outlier(roach_wide, "c_hab", "c_ps", "c_open")

1.2.4 Normality

Check the main dataframe for data distribution.

dlookr::plot_normality - for each variable specified the function determines normality by plotting histogram and q-q plot of the original data, log transformed, and square root transformed.

dlookr::plot_normality(roach_wide, "c_hab", "c_ps", "c_open")

1.3 Rescale count data

The code performs a two-step transformation: it first normalizes raw fish count data to a 0-1 scale, and then ensures that the normalized values have a small positive offset to prevent division by zero issues for modelling.

roach_wide <- roach_wide %>%
  mutate(across(c(c_hab, c_ps, c_open), ~ . / max(.), .names = "{.col}_normalized"),
         across(ends_with("_normalized"), ~ ifelse(. == 0, . + 0.0000000001, .)))

1.4 Summarise

First, create minimal dataset with variables of interest.

roach_wide_sum = select(roach_wide, c_hab, c_ps, c_open, c_hab_normalized, c_ps_normalized, c_open_normalized, sequence, light, treatment)

funModeling::profile_num - Provides an expanded summary including mean, standard deviation, skewness and kurtosis.

Skewness >0 = right-skew, <0 = left-skew, 0 = symmetric. kurtosis >3 = sharp, <3 = flat, 3 = normal

numeric_prof <-funModeling::profiling_num(roach_wide_sum)
variable mean std_dev variation_coef p_01 p_05 p_25 p_50 p_75 p_95 p_99 skewness kurtosis iqr range_98 range_80
c_hab 4.1780467 4.7238987 1.130648 0 0 0 2.0000000 8.0000000 12 12 0.6180048 1.782165 8.0000000 [0, 12] [0, 12]
c_ps 4.3388073 5.4305629 1.251626 0 0 0 0.0000000 12.0000000 12 12 0.6121211 1.476341 12.0000000 [0, 12] [0, 12]
c_open 3.4462691 4.3204959 1.253673 0 0 0 0.0000000 7.0000000 12 12 0.8004510 2.112379 7.0000000 [0, 12] [0, 11]
c_hab_normalized 0.3481706 0.3936582 1.130648 0 0 0 0.1666667 0.6666667 1 1 0.6180048 1.782165 0.6666667 [1e-10, 1] [1e-10, 1]
c_ps_normalized 0.3615673 0.4525469 1.251626 0 0 0 0.0000000 1.0000000 1 1 0.6121211 1.476341 1.0000000 [1e-10, 1] [1e-10, 1]
c_open_normalized 0.2871891 0.3600413 1.253673 0 0 0 0.0000000 0.5833333 1 1 0.8004510 2.112379 0.5833333 [1e-10, 1] [1e-10, 0.916666666666667]

Standard deviations of raw counts are high, as expected from normality plots.

Examine raw count descriptive summary before considering rescaled values.

count_sum <- bind_rows(
  roach_wide_sum %>%
    group_by(light) %>%
    summarise(sum_c_hab = sum(c_hab),
              sum_c_ps = sum(c_ps),
              sum_c_open = sum(c_open)) %>%
    rename(variable = light),
  roach_wide_sum %>%
    group_by(sequence) %>%
    summarise(sum_c_hab = sum(c_hab),
              sum_c_ps = sum(c_ps),
              sum_c_open = sum(c_open)) %>%
    rename(variable = sequence),
  roach_wide_sum %>%
    summarise(sum_c_hab = sum(c_hab),
              sum_c_ps = sum(c_ps),
              sum_c_open = sum(c_open),
              variable = "Total") %>%
    mutate(variable = factor(variable)))
variable sum_c_hab sum_c_ps sum_c_open
Light period
Day 15011 13426 2790
Night 13993 16694 21134
Experimental sequence
Baseline 0 15014 6283
I 1 4483 13129 3041
I 2 13272 0 7327
I 3 11249 1977 7273
Total count
Total 29004 30120 23924

Examine rescaled counts

rescaled_mean <- bind_rows(
  roach_wide_sum %>%
    group_by(light) %>%
    summarise(mean_c_hab = mean(c_hab_normalized),
              mean_c_ps = mean(c_ps_normalized),
              mean_c_open = mean(c_open_normalized)) %>%
    rename(variable = light),
  roach_wide_sum %>%
    group_by(sequence) %>%
    summarise(mean_c_hab = mean(c_hab_normalized),
              mean_c_ps = mean(c_ps_normalized),
              mean_c_open = mean(c_open_normalized)) %>%
    rename(variable = sequence),
    roach_wide_sum %>% filter(light=="Day")%>%
    group_by(treatment) %>%
    summarise(mean_c_hab = mean(c_hab_normalized),
              mean_c_ps = mean(c_ps_normalized),
              mean_c_open = mean(c_open_normalized)) %>%
    rename(variable = treatment))
variable mean_c_hab mean_c_ps mean_c_open
Light period
Day 0.4781792 0.4276886 0.0888761
Night 0.2695523 0.3215827 0.4071120
Experimental sequence
Baseline 0.0000000 0.7044857 0.2948104
I 1 0.2161941 0.6331501 0.1466532
I 2 0.6400463 0.0000000 0.3533468
I 3 0.5481969 0.0963450 0.3544347
Treatment (daytime)
Covered (B) 0.4968782 0.4083206 0.0899592
Uncovered (A) 0.4594801 0.4470566 0.0877931

2 Visualise main relationships

Apply custom themeing for ggplot2 objects.

theme_JN <- function(base_size=12){ 
  theme_grey() %+replace%
    theme(
      axis.text = element_text(colour="black"),
      axis.title = element_text(colour="black"),
      axis.ticks = element_line(colour="black"),
      panel.border = element_rect(colour = "black", fill=NA),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.background = element_blank(),
      strip.background = element_rect(colour = "black",fill = NA),
      panel.spacing.x = unit(12, "pt")
    ) 
}

Plot raw count data first to confirm suitability of rescaling data.

ggplot(roach_wide_sum, aes(x = sequence, y= c_ps)) +
  geom_boxplot() +theme_JN()+theme(text=element_text(size=20))

ggplot(roach_wide_sum, aes(x = sequence, y= c_hab)) +
  geom_boxplot()+theme_JN()+theme(text=element_text(size=20))

ggplot(roach_wide_sum, aes(x = sequence, y= c_open)) +
  geom_boxplot()+theme_JN()+theme(text=element_text(size=20))

The boxplots confirm raw count data are unsuitable for analysis due to large variation in counts presenting large IQR + outliers. Descriptive data e.g., medians are therefore hard to interpret and generalise.

From here on all analysis considers rescaled counts.

#Artificial habitat,light
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
         group_by(light) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = light, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#pumping station, light
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
         group_by(light) %>%
         summarise(mean_c_hab = mean(c_ps_normalized),
                   se_c_hab = sd(c_ps_normalized) / sqrt(n())),
       aes(x = light, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#open water, light
ggplot(roach_wide_sum %>%
         group_by(light) %>%
         summarise(mean_c_hab = mean(c_open_normalized),
                   se_c_hab = sd(c_open_normalized) / sqrt(n())),
       aes(x = light, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#Artificial habitat, sequence, light
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
         group_by(sequence, light) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#pumping station, sequence, light
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
         group_by(sequence, light) %>%
         summarise(mean_c_hab = mean(c_ps_normalized),
                   se_c_hab = sd(c_ps_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#open water, sequence, light
ggplot(roach_wide_sum %>%
         group_by(sequence, light) %>%
         summarise(mean_c_hab = mean(c_open_normalized),
                   se_c_hab = sd(c_open_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

By plotting the main visual relationships we’re able to quickly identify relationships in the data and begin addressing study objectives and hypothesis.

Here, we can consider that the day/night relationship in habitat occupancy was relatively fixed throughout the experiment in all habitat options. We see that open water occupancy was decreased in intervention 1, possibly attributed to introducing artificial habitat.

  1. H1 was supported, fish occupied the pumping station during the day
  2. H2 was partially supported, there was no equal preference but high variation in fish behavior
  3. H3 supported, habitat occupancy increased in artificial habitat when excluded from pumping station
  4. H4 supported, habitat occupancy was highest in artificial habitat post-exclusion. Preferential change demonstrated

To consider H5, let’s examine treatment effects.

#Artificial habitat, sequence, treatment
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
         group_by(sequence, treatment) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = treatment)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#pumping station, sequence, treatment
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
         group_by(sequence, treatment) %>%
         summarise(mean_c_hab = mean(c_ps_normalized),
                   se_c_hab = sd(c_ps_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = treatment)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#open water, sequence, treatment
ggplot(roach_wide_sum %>%
         group_by(sequence, treatment) %>%
         summarise(mean_c_hab = mean(c_open_normalized),
                   se_c_hab = sd(c_open_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = treatment)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#Artificial habitat, treatment, light
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
         group_by(treatment, light) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = treatment, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#daytime counts highest in covered treatments

#pumping station, treatment, light
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
         group_by(treatment, light) %>%
         summarise(mean_c_hab = mean(c_ps_normalized),
                   se_c_hab = sd(c_ps_normalized) / sqrt(n())),
       aes(x = treatment, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#open water, treatment, light
ggplot(roach_wide_sum %>%
         group_by(treatment, light) %>%
         summarise(mean_c_hab = mean(c_open_normalized),
                   se_c_hab = sd(c_open_normalized) / sqrt(n())),
       aes(x = treatment, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#Artificial habitat, treatment (daytime)
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>% filter(light=="Day")%>%
         group_by(treatment, light) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = treatment, y = mean_c_hab, colour = light)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

#Artificial habitat, sequence, treatment (daytime)
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>% filter(light=="Day")%>%
         group_by(sequence, treatment) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = sequence, y = mean_c_hab, colour = treatment)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

In these plots we see that treatment has an impact on habitat occupancy and in all sequences, habitat occupancy in artificial habitat was highest in uncovered treatments.Interestingly, occupancy in pumping station shows the opposite relationship to artificial habitat. Pumping station occupancy was highest in uncovered treatments, suggesting that cover is important i.e., when uncovered habitat is provided, more fish occupy the pumping station. Overall, we found support for H5.

Before continuing with statistical analysis let’s examine repeated measures and temporal effects.

#Artificial habitat, trial (daytime)
ggplot(roach_wide %>% filter(sequence!="Baseline")%>% filter(light=="Day")%>%
         group_by(trial) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = trial, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

The variation between trials suggests significant effect of within-subject observations (i.e., repeated measures). Additionally, there appears to be temporal influence (ex. trial 1 compared to trial 24)

ggplot(roach_wide %>% filter(sequence!="Baseline")%>%
         group_by(time) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = time, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

Evidence for hour-to-hour variation. Ensure is included in analysis.

Finally, considering further relationships not required for main objectives.

ggplot(roach_wide %>%
         filter(!is.na(hours_havail))%>% filter(sequence =="I 1")%>%
         group_by(hours_havail) %>%
         summarise(mean_c_hab = mean(c_hab_normalized),
                   se_c_hab = sd(c_hab_normalized) / sqrt(n())),
       aes(x = hours_havail, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

roach_wide %>%
  filter(!is.na(hours_havail)) %>%
  filter(sequence == "I 1") %>%
  mutate(hours = as.numeric(hours_havail)) %>%
  with(cor.test(hours, c_hab_normalized))
## 
##  Pearson's product-moment correlation
## 
## data:  hours and c_hab_normalized
## t = 4.8211, df = 1702, p-value = 1.555e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.06896398 0.16266011
## sample estimates:
##       cor 
## 0.1160703

There is no real relationship. Occupancy peaks 24h after introduction, as expected due to nocturnal behavior. Correlation testing shows a weak positive correlation, but significant.

ggplot(roach_wide %>%
         filter(!is.na(hours_lout))%>%
         group_by(hours_lout) %>%
         summarise(mean_c_hab = mean(c_open_normalized),
                   se_c_hab = sd(c_open_normalized) / sqrt(n())),
       aes(x = hours_lout, y = mean_c_hab)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
                width = 0.2)+theme_JN()+theme(text=element_text(size=20))

roach_wide %>%
  filter(!is.na(hours_lout)) %>%
  mutate(hours = as.numeric(hours_lout)) %>%
  with(cor.test(hours, c_open_normalized))
## 
##  Pearson's product-moment correlation
## 
## data:  hours and c_open_normalized
## t = 9.4592, df = 4606, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1096056 0.1662548
## sample estimates:
##       cor 
## 0.1380431

Again, no strong relationship. Occupancy peaks within 1h of lights out. Correlation testing shows a weak positive correlation, but significant.

Ordinarily we may now choose to perform group comparisons etc on our variables of interest to support the relationships shown in the visual analysis. However, the nature of the problem is complex and thus is best treated by a modelling approach.

3 GLMM

Data exploration has shown:

  • Continuous non-negative response variable with zero values treated by adding 0.000000001
  • Left-skewed, Gamma distribution undesired
  • Gaussian distribution suitable for continuous response. Apply log-link to account for uneven variance in sequence groups.
  • Raw counts would be overdispersed, treated by rescaling.
  • No NAs
  • No outliers
  • Non-normally distributed response variable, but large sample size
  • No zeros in response
  • No continuous predictors, so no collinearity issues
  • No interactions
  • No Independence of response variables. Repeated measures design requires random effect of trial
  • Temporal dependency should be accounted for with random effects
table(roach_wide$sequence) 
## 
## Baseline      I 1      I 2      I 3 
##     1776     1728     1728     1710
table(roach_wide$treatment) 
## 
##   Covered (B) Uncovered (A) 
##          3471          3471
table(roach_wide$light)
## 
##   Day Night 
##  2616  4326

The data is well-balanced across the grouping variables except light, which is expected due to day night differences.

3.1 Data preperation

Let’s extract hour from the time variable and store as factor for random effect modelling.

roach_wide <- roach_wide %>%
  mutate(time_factor = factor(as.numeric(format(strptime(time, format = "%H:%M:%S"), "%H"))))

Create a new dataframe for binary modelling later.

#Create new DF for binary model
roach_binary <- roach_wide %>%filter (sequence=="I 1"|sequence=="I 3")%>%filter(light=="Day")
roach_binary$binary <- ifelse(roach_binary$sequence =="I 1", 0,1)

3.2 Build GLMM

3.2.1 Artifical habitat

Start by creating a null (intercept only) model.

mod1 <- glmmTMB(c_hab_normalized ~ 1, data = roach_wide%>%filter(sequence!="Baseline"),
                family = gaussian(link="log"))
summary(mod1)
##  Family: gaussian  ( log )
## Formula:          c_hab_normalized ~ 1
## Data: roach_wide %>% filter(sequence != "Baseline")
## 
##      AIC      BIC   logLik deviance df.resid 
##   4939.5   4952.6  -2467.8   4935.5     5164 
## 
## 
## Dispersion estimate for gaussian family (sigma^2): 0.152 
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.7596     0.0116  -65.47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Intrepting the null model is not important here, but we can see that the estimated mean of our normalised count variable is significantly different from 0. We will use this null model for determining model fit by comparing subsequent models to the null using AIC selection.

Let’s add our fixed effects of interest.

mod1.1 <- update(mod1, c_hab_normalized ~ sequence + light + treatment)
summary(mod1.1)
##  Family: gaussian  ( log )
## Formula:          c_hab_normalized ~ sequence + light + treatment
## Data: roach_wide %>% filter(sequence != "Baseline")
## 
##      AIC      BIC   logLik deviance df.resid 
##   2422.0   2461.3  -1205.0   2410.0     5160 
## 
## 
## Dispersion estimate for gaussian family (sigma^2): 0.0934 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.20486    0.03538  -34.05  < 2e-16 ***
## sequenceI 2             1.12453    0.03607   31.18  < 2e-16 ***
## sequenceI 3             1.02292    0.03656   27.98  < 2e-16 ***
## lightNight             -0.62328    0.01728  -36.07  < 2e-16 ***
## treatmentUncovered (A) -0.08459    0.01619   -5.22 1.76e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model outputs are sensible and have reduced the AIC of the null model from 4939 to 2422. All our predictor terms are significant with sensible coefficent and error estimates. For example, we can see that the mean value of c_hab_normalized is reduced by -0.623 units during night time light conditions.

It can be useful to plot model predictions early whilst building a complete model to understand how adding independent variables affects the estimated response.

plot(ggpredict(mod1.1, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))

Here, we can see that the predicted habitat occupancy response is sensible, and close to the observed values we plotted earlier.

Importantly, random effects need to be added to account for repeated measures and temporal dependency.

mod1.2 <- update(mod1.1, . ~ . + (1 | time_factor/day/trial))
summary(mod1.2)
##  Family: gaussian  ( log )
## Formula:          
## c_hab_normalized ~ sequence + light + treatment + (1 | time_factor/day/trial)
## Data: roach_wide %>% filter(sequence != "Baseline")
## 
##      AIC      BIC   logLik deviance df.resid 
##   2401.6   2460.5  -1191.8   2383.6     5157 
## 
## Random effects:
## 
## Conditional model:
##  Groups                Name        Variance  Std.Dev.
##  trial:day:time_factor (Intercept) 1.001e-10 0.00001 
##  day:time_factor       (Intercept) 7.687e-03 0.08767 
##  time_factor           (Intercept) 9.906e-04 0.03147 
##  Residual                          9.123e-02 0.30204 
## Number of obs: 5166, groups:  
## trial:day:time_factor, 5166; day:time_factor, 216; time_factor, 24
## 
## Dispersion estimate for gaussian family (sigma^2): 0.0912 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.19015    0.03873 -30.731  < 2e-16 ***
## sequenceI 2             1.12143    0.03946  28.418  < 2e-16 ***
## sequenceI 3             0.98280    0.04094  24.007  < 2e-16 ***
## lightNight             -0.62157    0.02527 -24.600  < 2e-16 ***
## treatmentUncovered (A) -0.08580    0.01598  -5.369  7.9e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ggpredict(mod1.2, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))

As before, the model predictions are improved and the within-subject variation has been handled by adding the random effect of trial.

We can check the goodness of fit by plotting residuals using DHARMA.

fittedmod1.2 <- mod1.2
simuout1 <- simulateResiduals(fittedModel = fittedmod1.2)
plot(simuout1, quantreg = T)

The residuals generally follow a linear relationship and he deviation is accepted within a generalised framework. The quantile deviations present are likely a result of dispersion due to clumped values close to 0 and 1.

The model process is now repeated for the remaining response variables; pumping station and open water.

3.2.2 Pumping station

#Null model
mod2 <- glmmTMB(c_ps_normalized ~ 1, data = roach_wide%>%filter(sequence!="I 2"),
                family = gaussian(link="log"))
summary(mod2)
##  Family: gaussian  ( log )
## Formula:          c_ps_normalized ~ 1
## Data: roach_wide %>% filter(sequence != "I 2")
## 
##      AIC      BIC   logLik deviance df.resid 
##   6784.9   6798.0  -3390.4   6780.9     5212 
## 
## 
## Dispersion estimate for gaussian family (sigma^2): 0.215 
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.73106    0.01334  -54.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add fixed effects
mod2.1 <- update(mod2, c_ps_normalized ~ sequence + light + treatment)
summary(mod2.1)
##  Family: gaussian  ( log )
## Formula:          c_ps_normalized ~ sequence + light + treatment
## Data: roach_wide %>% filter(sequence != "I 2")
## 
##      AIC      BIC   logLik deviance df.resid 
##   4372.6   4411.9  -2180.3   4360.6     5208 
## 
## 
## Dispersion estimate for gaussian family (sigma^2): 0.135 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -0.20719    0.01776 -11.665  < 2e-16 ***
## sequenceI 1            -0.10587    0.01847  -5.733 9.88e-09 ***
## sequenceI 3            -2.01448    0.09481 -21.248  < 2e-16 ***
## lightNight             -0.29107    0.01827 -15.932  < 2e-16 ***
## treatmentUncovered (A)  0.05181    0.01827   2.836  0.00457 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add random effects
mod2.2 <- update(mod2.1, . ~ . + (1 | time_factor/day/trial))
summary(mod2.2)
##  Family: gaussian  ( log )
## Formula:          
## c_ps_normalized ~ sequence + light + treatment + (1 | time_factor/day/trial)
## Data: roach_wide %>% filter(sequence != "I 2")
## 
##      AIC      BIC   logLik deviance df.resid 
##   4218.1   4277.1  -2100.1   4200.1     5205 
## 
## Random effects:
## 
## Conditional model:
##  Groups                Name        Variance  Std.Dev. 
##  trial:day:time_factor (Intercept) 9.025e-02 3.004e-01
##  day:time_factor       (Intercept) 2.876e-11 5.363e-06
##  time_factor           (Intercept) 3.499e-13 5.915e-07
##  Residual                          1.061e-01 3.257e-01
## Number of obs: 5214, groups:  
## trial:day:time_factor, 5214; day:time_factor, 218; time_factor, 24
## 
## Dispersion estimate for gaussian family (sigma^2): 0.106 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -0.26812    0.01981 -13.536  < 2e-16 ***
## sequenceI 1            -0.09148    0.01932  -4.736 2.18e-06 ***
## sequenceI 3            -2.05188    0.08633 -23.768  < 2e-16 ***
## lightNight             -0.25369    0.01920 -13.212  < 2e-16 ***
## treatmentUncovered (A)  0.04263    0.01915   2.227    0.026 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ggpredict(mod2.2, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))

3.2.3 Open water

#Null model
mod3 <- glmmTMB(c_open_normalized ~ 1, data = roach_wide,
                family = gaussian(link="log"))
summary(mod3)
##  Family: gaussian  ( log )
## Formula:          c_open_normalized ~ 1
## Data: roach_wide
## 
##      AIC      BIC   logLik deviance df.resid 
##   5520.5   5534.2  -2758.3   5516.5     6940 
## 
## 
## Dispersion estimate for gaussian family (sigma^2): 0.13 
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.24761    0.01505  -82.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add fixed effects
mod3.1 <- update(mod3, c_open_normalized ~ sequence + light + treatment)
summary(mod3.1)
##  Family: gaussian  ( log )
## Formula:          c_open_normalized ~ sequence + light + treatment
## Data: roach_wide
## 
##      AIC      BIC   logLik deviance df.resid 
##   3544.6   3592.6  -1765.3   3530.6     6935 
## 
## 
## Dispersion estimate for gaussian family (sigma^2): 0.0974 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -2.47386    0.07071  -34.99  < 2e-16 ***
## sequenceI 1            -0.54735    0.04770  -11.48  < 2e-16 ***
## sequenceI 2             0.21837    0.03055    7.15 8.86e-13 ***
## sequenceI 3             0.28354    0.02989    9.49  < 2e-16 ***
## lightNight              1.49549    0.06583   22.72  < 2e-16 ***
## treatmentUncovered (A)  0.08195    0.02220    3.69 0.000223 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add random effects
mod3.2 <- update(mod3.1, . ~ . + (1 | time_factor/day/trial))
summary(mod3.2)
##  Family: gaussian  ( log )
## Formula:          
## c_open_normalized ~ sequence + light + treatment + (1 | time_factor/day/trial)
## Data: roach_wide
## 
##      AIC      BIC   logLik deviance df.resid 
##   3162.7   3231.2  -1571.4   3142.7     6932 
## 
## Random effects:
## 
## Conditional model:
##  Groups                Name        Variance  Std.Dev. 
##  trial:day:time_factor (Intercept) 3.340e-01 5.780e-01
##  day:time_factor       (Intercept) 2.883e-11 5.369e-06
##  time_factor           (Intercept) 9.907e-16 3.147e-08
##  Residual                          5.889e-02 2.427e-01
## Number of obs: 6942, groups:  
## trial:day:time_factor, 6942; day:time_factor, 290; time_factor, 24
## 
## Dispersion estimate for gaussian family (sigma^2): 0.0589 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -2.39186    0.05561  -43.01  < 2e-16 ***
## sequenceI 1            -0.65043    0.04127  -15.76  < 2e-16 ***
## sequenceI 2             0.09886    0.03416    2.89 0.003804 ** 
## sequenceI 3             0.12975    0.03364    3.86 0.000115 ***
## lightNight              1.38818    0.04682   29.65  < 2e-16 ***
## treatmentUncovered (A)  0.03996    0.02421    1.65 0.098784 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ggpredict(mod3.2, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))

3.2.4 Binary probability model

To separately analyse the effect of habitat exclusion on overall probability of fish occupying a given habitat post-exclusion (sequence: I 3) let’s build a pair of GLMMs in glmmTMB with binomial distribution (link = logit). In these models, fixed effects are limited to experimental sequence (I 1, I 3) and used the same random effects specified in the main models.

mod_binary_ah <- glmmTMB(c_hab_normalized ~ as.factor(binary) + (1 | time_factor/day/trial), data = roach_binary, family = binomial())
summary(mod_binary_ah)
##  Family: binomial  ( logit )
## Formula:          
## c_hab_normalized ~ as.factor(binary) + (1 | time_factor/day/trial)
## Data: roach_binary
## 
##      AIC      BIC   logLik deviance df.resid 
##   1258.3   1284.1   -624.2   1248.3     1273 
## 
## Random effects:
## 
## Conditional model:
##  Groups                Name        Variance  Std.Dev. 
##  trial:day:time_factor (Intercept) 1.903e-09 4.363e-05
##  day:time_factor       (Intercept) 6.201e-02 2.490e-01
##  time_factor           (Intercept) 7.335e-28 2.708e-14
## Number of obs: 1278, groups:  
## trial:day:time_factor, 1278; day:time_factor, 60; time_factor, 10
## 
## Conditional model:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.1126     0.1031  -10.79   <2e-16 ***
## as.factor(binary)1   2.9091     0.1643   17.71   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model predicts a significant increase in the probability of fish occupying artificial habitat post-exclusion.

plot(ggpredict(mod_binary_ah, terms = c("binary")))+theme_JN()+theme(text=element_text(size=20))

mod_binary_ps <- glmmTMB(c_ps_normalized ~ as.factor(binary) + (1 | time_factor/day/trial), data = roach_binary, family = binomial())
summary(mod_binary_ps)
##  Family: binomial  ( logit )
## Formula:          
## c_ps_normalized ~ as.factor(binary) + (1 | time_factor/day/trial)
## Data: roach_binary
## 
##      AIC      BIC   logLik deviance df.resid 
##   1127.6   1153.3   -558.8   1117.6     1273 
## 
## Random effects:
## 
## Conditional model:
##  Groups                Name        Variance  Std.Dev. 
##  trial:day:time_factor (Intercept) 2.877e-09 5.364e-05
##  day:time_factor       (Intercept) 4.603e-12 2.145e-06
##  time_factor           (Intercept) 1.232e-20 1.110e-10
## Number of obs: 1278, groups:  
## trial:day:time_factor, 1278; day:time_factor, 60; time_factor, 10
## 
## Conditional model:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         1.05317    0.08971   11.74   <2e-16 ***
## as.factor(binary)1 -3.38536    0.16652  -20.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model predicts a significant decrease in the probability of fish occupying pumping station post-exclusion.

plot(ggpredict(mod_binary_ps, terms = c("binary")))+theme_JN()+theme(text=element_text(size=20))

3.3 Save model predictions

We will save the individual model predictions using ggpredict and then combine and restructure the dataframe using dplyr from the tidyverse.

c_hab_glmmm <-ggpredict(mod1.2, terms = c("sequence", "light"))
c_ps_glmmm <-ggpredict(mod2.2, terms = c("sequence", "light"))
c_open_glmmm <-ggpredict(mod3.2, terms = c("sequence", "light"))
c_hab_glmmm_treat <-ggpredict(mod1.2, terms = c("sequence", "treatment", "light[Day]"))

Next, a grouping variable for each habitat type is added. A second variable ‘present’ is added to indicate if data is present for the experimental sequence.

c_hab_glmmm <- c_hab_glmmm %>% mutate(habitat = "Artificial habitat", present = "T")
c_ps_glmmm <- c_ps_glmmm %>% mutate(habitat = "Pumping station", present = "T")
c_open_glmmm <- c_open_glmmm %>% mutate(habitat = "Open water", present = "T")

The dataframes are then combined and restructured for plotting. First, the rows are bound then NA values are omitted and dataframe attributes are removed (for simplicity, not necessity). Dummy rows are added for the sequence groups without data, and variables are then converted to factors with specific levels.

modeloutput_df <- bind_rows(c_hab_glmmm, c_ps_glmmm, c_open_glmmm)
modeloutput_df <- na.omit(modeloutput_df)
modeloutput_df <- as.data.frame(unclass(modeloutput_df))
modeloutput_df <- modeloutput_df %>%
  add_row(x = 'Baseline', group = 'Day', habitat = 'Artificial habitat', present = "F") %>%
  add_row(x = 'Baseline', group = 'Night', habitat = 'Artificial habitat', present = "F") %>%
  add_row(x = 'I 2', group = 'Day', habitat = 'Pumping station', present = "F") %>%
  add_row(x = 'I 2', group = 'Night', habitat = 'Pumping station', present = "F")
modeloutput_df$x <- factor(modeloutput_df$x, levels = c('Baseline', 'I 1', 'I 2', 'I 3'))
modeloutput_df$habitat <- as.factor(modeloutput_df$habitat)
modeloutput_df$habitat <- factor(modeloutput_df$habitat, levels = c('Pumping station', 'Open water', 'Artificial habitat'))
x predicted std.error conf.low conf.high group habitat present
I 1 0.3041765 0.0387274 0.2819427 0.3281637 Day Artificial habitat T
I 1 0.1633742 0.0409847 0.1507639 0.1770393 Night Artificial habitat T
I 2 0.9335923 0.0222946 0.8936759 0.9752916 Day Artificial habitat T
I 2 0.5014354 0.0227331 0.4795839 0.5242825 Night Artificial habitat T
I 3 0.8127345 0.0240701 0.7752828 0.8519954 Day Artificial habitat T
I 3 0.4365223 0.0238856 0.4165575 0.4574439 Night Artificial habitat T
Baseline 0.7648180 0.0198081 0.7356943 0.7950946 Day Pumping station T
Baseline 0.5934450 0.0191550 0.5715783 0.6161482 Night Pumping station T
I 1 0.6979559 0.0205494 0.6704035 0.7266407 Day Pumping station T
I 1 0.5415647 0.0197917 0.5209591 0.5629853 Night Pumping station T
I 3 0.0982743 0.0863840 0.0829678 0.1164046 Day Pumping station T
I 3 0.0762539 0.0867094 0.0643361 0.0903794 Night Pumping station T
Baseline 0.0914594 0.0556065 0.0820155 0.1019907 Day Open water T
Baseline 0.3665273 0.0286108 0.3465395 0.3876679 Night Open water T
I 1 0.0477254 0.0596329 0.0424610 0.0536426 Day Open water T
I 1 0.1912617 0.0373060 0.1777760 0.2057704 Night Open water T
I 2 0.1009631 0.0519537 0.0911884 0.1117855 Day Open water T
I 2 0.4046138 0.0272225 0.3835914 0.4267883 Night Open water T
I 3 0.1041301 0.0506727 0.0942852 0.1150029 Day Open water T
I 3 0.4173058 0.0265197 0.3961693 0.4395699 Night Open water T
Baseline NA NA NA NA Day Artificial habitat F
Baseline NA NA NA NA Night Artificial habitat F
I 2 NA NA NA NA Day Pumping station F
I 2 NA NA NA NA Night Pumping station F

A similar process follows for the treatment dataframe.

c_hab_glmmm_treat <- as.data.frame(unclass(c_hab_glmmm_treat))
c_hab_glmmm_treat <- c_hab_glmmm_treat %>% mutate(present = "T")
c_hab_glmmm_treat <- c_hab_glmmm_treat %>% select(-facet)
c_hab_glmmm_treat <- c_hab_glmmm_treat %>%
  add_row(x = 'Baseline', group = 'Covered (B)', present = "F") %>%
  add_row(x = 'Baseline', group = 'Uncovered (A)', present = "F")
c_hab_glmmm_treat$x <- factor(c_hab_glmmm_treat$x, levels = c('Baseline', 'I 1', 'I 2', 'I 3'))
c_hab_glmmm_treat$group <- factor(c_hab_glmmm_treat$group, levels = c('Uncovered (A)', 'Covered (B)'))
x predicted std.error conf.low conf.high group present
I 1 0.3041765 0.0387274 0.2819427 0.3281637 Covered (B) T
I 1 0.2791664 0.0388745 0.2586861 0.3012682 Uncovered (A) T
I 2 0.9335923 0.0222946 0.8936759 0.9752916 Covered (B) T
I 2 0.8568302 0.0228311 0.8193338 0.8960425 Uncovered (A) T
I 3 0.8127345 0.0240701 0.7752828 0.8519954 Covered (B) T
I 3 0.7459096 0.0245659 0.7108463 0.7827025 Uncovered (A) T
Baseline NA NA NA NA Covered (B) F
Baseline NA NA NA NA Uncovered (A) F

The binary model predictions are then saved. The grouping variable is modified to allow for easy plotting.

ah_prob <- ggpredict(mod_binary_ah, terms = c("binary"))
ps_prob <- ggpredict(mod_binary_ps, terms = c("binary"))

ps_prob$group <- 2 
ps_prob$group <- factor(ps_prob$group)

habitat_prob_df <- bind_rows(ah_prob, ps_prob)
x predicted std.error conf.low conf.high group
0 0.2473919 0.1030883 0.2117147 0.2868931 1
1 0.8577281 0.1261066 0.8248246 0.8853107 1
0 0.7413837 0.0897146 0.7062698 0.7736453 2
1 0.0884921 0.1402805 0.0686808 0.1133226 2

4 Model output figures

The final model outputs can now be plotted. The following figures are finalized versions with some edits made after exporting from R, thus the code snippet will not produce an identical figure.

ggplot(data=modeloutput_df, aes(x=x, y=predicted, fill=group))+
  geom_tile(aes(x=x, y=0.5,height = Inf, fill=present), alpha = 0.3,  show.legend = FALSE) + 
  scale_fill_manual(values = c("T" = "grey80", "F" = "grey90"), guide = FALSE) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), size=0.5,width = 0.6 ,position = position_dodge(width = 0.7),  show.legend = FALSE) +
  #geom_path(aes(group = interaction(habitat, group), linetype = group), linewidth = 0.3 ,position = position_dodge(width = 0.7), show.legend = FALSE) +
  scale_linetype_manual(values = c("Day" = "dashed", "Night" = "dotted"))+
  geom_point(aes(shape=group),size=2, position = position_dodge(width = 0.7),  show.legend = FALSE) +
  scale_shape_manual(values = c("Day" = 20, "Night" = 4))+
  scale_y_continuous(breaks = seq(0, 1,0.1), limits=c(0,1),expand=c(0.05,0)) +
  scale_x_discrete(expand=c(0,0))+
  labs(x = 'Experimental sequence',y= 'Habitat occupancy')+
  theme_JN()+
  theme(axis.text.x=element_text(size=8),
        strip.background = element_rect(fill = "grey90"),
        panel.spacing.x =unit(0, "lines") ) + 
  facet_grid(~ habitat, scales = "fixed")+
  geom_text(data = modeloutput_df %>% filter(present == "F"), aes(x = x, y = 0.5, label = "Unavailable"), size = 8/.pt, angle = 90, fontface = "italic")
Fig 1. Habitat occupancy fitted by Generalised Linear Mixed Models (log-linked Gaussian distributions) across each response variable a) pumping station, b) open water, c) artificial habitat. All models incorporated the random effect of time, day, and trial. Error bars represent 95% confidence intervals. Values fitted by light period, day = dots and dashed lines, night = crosses and dotted lines. Light grey background shading denoted by ’unavailable’ indicates where habitat was not available in the experimental sequence. Models fitted using glmmTMB in R 4.3.1.
Fig 1. Habitat occupancy fitted by Generalised Linear Mixed Models (log-linked Gaussian distributions) across each response variable a) pumping station, b) open water, c) artificial habitat. All models incorporated the random effect of time, day, and trial. Error bars represent 95% confidence intervals. Values fitted by light period, day = dots and dashed lines, night = crosses and dotted lines. Light grey background shading denoted by ’unavailable’ indicates where habitat was not available in the experimental sequence. Models fitted using glmmTMB in R 4.3.1.
ggplot(data=c_hab_glmmm_treat, aes(x=x, y=predicted, fill=group))+
  geom_tile(aes(x=x, y=0.5,height = Inf, fill=present), alpha = 0.3,  show.legend = FALSE) + 
  scale_fill_manual(values = c("T" = NA, "F" = "grey90"), guide = FALSE) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), size=0.5,width = 0.6 ,position = position_dodge(width = 0.7),  show.legend = FALSE) +
  geom_point(aes(shape=group),size=1.5, position = position_dodge(width = 0.7),  show.legend = FALSE) +
  scale_shape_manual(values = c("Covered (B)" = 20, "Uncovered (A)" = 4))+
  scale_y_continuous(breaks = seq(0, 1,0.1), limits=c(0,1),expand=c(0.05,0)) +
  scale_x_discrete(expand=c(0,0))+
  labs(x = 'Experimental sequence',y= 'Daytime artifical habitat occupancy')+
  coord_cartesian(clip="off")+
  theme_JN()+
  theme(axis.text.x=element_text(size=8)) + 
  geom_text(data = c_hab_glmmm_treat %>% filter(present == "F"), aes(x = x, y = 0.5, label = "Unavailable"), size = 8/.pt, angle = 90, fontface = "italic")
Fig 2. Daytime artificial habitat occupancy fitted by Generalised Linear Mixed Model (log-linked Gaussian distribution). The model incorporated the random effect of time, day, and trial. Error bars represent 95% confidence intervals. Values fitted by habitat treatment, unsheltered (A) = crosses, sheltered (B) = dots. Light grey background shading denoted by ’unavailable’ indicates where habitat was not available in the experimental sequence. Model fitted using glmmTMB in R 4.3.1.
Fig 2. Daytime artificial habitat occupancy fitted by Generalised Linear Mixed Model (log-linked Gaussian distribution). The model incorporated the random effect of time, day, and trial. Error bars represent 95% confidence intervals. Values fitted by habitat treatment, unsheltered (A) = crosses, sheltered (B) = dots. Light grey background shading denoted by ’unavailable’ indicates where habitat was not available in the experimental sequence. Model fitted using glmmTMB in R 4.3.1.
ggplot(data=habitat_prob_df, aes(x = factor(x, labels = c("Pre-exclusion", "Post-exclusion")), y=predicted, fill=group))+
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width=.2) +
  geom_line(aes(group=group,linetype = group),  show.legend = FALSE)+
  geom_point(aes(shape=group),size=2,  show.legend = FALSE) +
  scale_shape_manual(values = c("1" = 20, "2" = 4))+
  scale_linetype_manual(values = c("1" = "dashed", "2" = "dotted"))+
  scale_y_continuous(breaks = seq(0, 1, 0.1), limits = c(0, 1), expand = c(0.05, 0), 
                     labels = scales::percent(seq(0, 1, 0.1), scale = 100)) +
  scale_x_discrete()+
  labs(x = 'Pumping station exclusion',y=expression(atop(NA, atop(textstyle('Predicted probabaility of'), textstyle('daytime habitat occupancy')))))+
  coord_cartesian(clip="off")+
  theme_JN()+
  theme(axis.text.x=element_text(size=8))
Fig 3. The predicted probability of fish occupying a habitat during daytime, fitted by Generalised Linear Mixed Models (logit-linked binomial distributions). The model incorporated the random effect of time, day, and trial. Error bars represent 95% confidence intervals. Values fitted by habitat type, artificial habitat = dots and dashed lines, pumping station = crosses and dotted lines. Models fitted using glmmTMB in R 4.3.1.
Fig 3. The predicted probability of fish occupying a habitat during daytime, fitted by Generalised Linear Mixed Models (logit-linked binomial distributions). The model incorporated the random effect of time, day, and trial. Error bars represent 95% confidence intervals. Values fitted by habitat type, artificial habitat = dots and dashed lines, pumping station = crosses and dotted lines. Models fitted using glmmTMB in R 4.3.1.

5. Post-hoc analysis

Post-hoc analysis can now be performed to support the model outputs. Tests of relevance here include paired t.tests and repeated measures ANOVA.

#AH
anov1<- aov(c_hab_normalized ~ sequence + Error(trial), data = roach_wide%>%filter(sequence!="Baseline"))
summary(anov1)
## 
## Error: trial
##           Df Sum Sq Mean Sq F value  Pr(>F)   
## sequence   1  43.70   43.70   9.683 0.00508 **
## Residuals 22  99.29    4.51                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##             Df Sum Sq Mean Sq F value Pr(>F)    
## sequence     2  171.6   85.78   934.6 <2e-16 ***
## Residuals 5140  471.8    0.09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#AH
anov2<- aov(c_ps_normalized ~ sequence + Error(trial), data = roach_wide%>%filter(sequence!="I 2"))
summary(anov2)
## 
## Error: trial
##           Df Sum Sq Mean Sq F value  Pr(>F)   
## sequence   1  126.0  125.97   13.15 0.00149 **
## Residuals 22  210.8    9.58                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##             Df Sum Sq Mean Sq F value Pr(>F)    
## sequence     2  380.5  190.24    2446 <2e-16 ***
## Residuals 5188  403.5    0.08                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#OW
anov3<- aov(c_open_normalized ~ sequence + Error(trial), data = roach_wide)
summary(anov3)
## 
## Error: trial
##           Df Sum Sq Mean Sq F value Pr(>F)  
## sequence   1  15.24  15.239   3.013 0.0966 .
## Residuals 22 111.27   5.058                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##             Df Sum Sq Mean Sq F value Pr(>F)    
## sequence     3   49.5  16.487   157.5 <2e-16 ***
## Residuals 6915  723.8   0.105                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1